Updates
!pip install plotly
Requirement already satisfied: plotly in /opt/conda/lib/python3.9/site-packages (5.8.0)
Requirement already satisfied: tenacity>=6.2.0 in /opt/conda/lib/python3.9/site-packages (from plotly) (8.0.1)
%%html
<style>
.output_wrapper .output {
overflow-y: visible;
height: fit-content;
}
</style>
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
from IPython.display import Markdown as md, display
# DISPLAY SETTINGS
pd.set_option('display.max_rows', None)
# IMPORT DFs
# admissions
admissions_df = pd.read_csv('./dataupload/9831/caboodle_patient_hospital_admissions.csv')[['project_id','encounter_key','hospital_service','start_datetime','end_datetime']]
admissions_df['start_datetime'] = pd.to_datetime(admissions_df['start_datetime'])
# diagnoses
df_main = pd.read_csv("./dataupload/9830/caboodle_patient_diagnoses.csv")[['project_id','start_datetime','diag_local_code','encounter_key','diag_name','primary_diagnoses']]
df_main['start_datetime'] = pd.to_datetime(df_main['start_datetime'])
# # REMOVE PATIENTS WHO HAVE NEVER BEEN DIAGNOSED WITH ANXIETY CODE
# ANXIETY_CODES = ['F410','F411','F412','F413','F418','F419',]
# anxiety_check = df_main.groupby(['project_id']).apply(lambda x: x['diag_local_code'].isin(ANXIETY_CODES).any())
# # LIST OF PTS NEVER DIAGNOSED WITH ANXIETY
# never_anxiety_dx = list(anxiety_check.where(lambda x: x==False).dropna().index)
# print(f"Patients never received anxiety diagnosis: {never_anxiety_dx if never_anxiety_dx else 'NONE'}")
# ADD COLUMNS FOR DIAGNOSIS HIERARCHIES
df_main['diag_level_1'] = df_main['diag_local_code'].str[0]
df_main['diag_level_2'] = df_main['diag_local_code'].str[0:2]
df_main['diag_level_3'] = df_main['diag_local_code'].str[0:3]
# GRAB ANXIETY CODES
ANXIETY_CODES = list(df_main[df_main['diag_local_code'].str.contains('F41', na=False)]['diag_local_code'].value_counts().index)
# procedures
proc_df = pd.read_csv("./dataupload/10035/caboodle_patient_procedures.csv")
proc_df['start_datetime'] = pd.to_datetime(proc_df['start_datetime'])
proc_df['end_datetime'] = pd.to_datetime(proc_df['end_datetime'])
# meds
orders_df = pd.read_csv("./dataupload/10054/caboodle_patient_selected_medication_orders_antidepreshypnoticsanxiolytics.csv")
comeds_df = pd.read_csv("./dataupload/10036/caboodle_patient_selected_medication_comeds_hypnoticsanxiolytics.csv")
admins_df = pd.read_csv("./dataupload/10052/caboodle_patient_selected_medication_admins_antidepreshypnoticsanxiolytics.csv")
admins_df['start_datetime'] = pd.to_datetime(admins_df['start_datetime'])
admins_df[['drug_name','dose']] = admins_df['drug_name'].str.split(r'\s+(?=\d)', n=1, expand=True)
# MAKE DICT OF DRUG AND DRUG CLASS
temp_orders_df = orders_df.copy()
temp_orders_df[['drug_name','dose']] = temp_orders_df['drug_name'].str.split(r'\s+(?=\d)', n=1, expand=True)
class_name = dict(zip(temp_orders_df['drug_name'], temp_orders_df['drug_pharmaceutical_class_name']))
subclass_name = dict(zip(temp_orders_df['drug_name'], temp_orders_df['drug_pharmaceutical_subclass_name']))
admins_df['drug_class'] = admins_df['drug_name'].replace(class_name)
admins_df['drug_subclass'] = admins_df['drug_name'].replace(subclass_name)
display(md('**Diagnoses**'))
display(df_main.head())
display(md('**Specialties**'))
display(admissions_df.head())
display(md('***Medications***'))
display(md('*Orders*'))
display(orders_df.head())
display(md('*Co-meds*'))
display(comeds_df.head())
display(md('*Administrations*'))
display(admins_df.head())
display(md('**Procedures**'))
display(proc_df.head())
pd.set_option('display.max_columns', 500)
Diagnoses
| project_id | start_datetime | diag_local_code | encounter_key | diag_name | primary_diagnoses | diag_level_1 | diag_level_2 | diag_level_3 | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | N185 | #### | Chronic kidney disease, stage 5 | 0 | N | N1 | N18 |
| 1 | PR-######## | ####-##-## | Q623 | #### | Other obstructive defects of renal pelvis and ... | 0 | Q | Q6 | Q62 |
| 2 | PR-######## | ####-##-## | Z936 | #### | Other artificial openings of urinary tract status | 0 | Z | Z9 | Z93 |
| 3 | PR-######## | ####-##-## | N185 | #### | Chronic kidney disease, stage 5 | 0 | N | N1 | N18 |
| 4 | PR-######## | ####-##-## | R520 | #### | Acute pain | 0 | R | R5 | R52 |
Specialties
| project_id | encounter_key | hospital_service | start_datetime | end_datetime | |
|---|---|---|---|---|---|
| 0 | PR-######## | #### | Nephrology | ####-##-## ##:##:## | ####-##-## ##:##:## |
| 1 | PR-######## | #### | Urology | ####-##-## ##:##:## | ####-##-## ##:##:## |
| 2 | PR-######## | #### | Urology | ####-##-## ##:##:## | ####-##-## ##:##:## |
| 3 | PR-######## | #### | CAMHS | ####-##-## ##:##:## | ####-##-## ##:##:## |
| 4 | PR-######## | #### | Dermatology | ####-##-## ##:##:## | ####-##-## ##:##:## |
Medications
Orders
| project_id | start_datetime | end_datetime | MedicationOrderKey | ordered_datetime | medication_order_id | sequence_number | drug_code | drug_name | drug_gpi | ... | indication_comments | response | medication_order_name | medication_order_mode_name | medication_order_class_name | medication_order_source_name | disps_this_period | admins_this_period | first_admin_datetime | encounter_key | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## ##:##:## | ####-##-## ##:##:## | #### | ####-##-## ##:##:## | #### | 1 | 180811.0 | DIAZEPAM 2MG/5ML ORAL SOLUTION SUGAR-FREE | #### | ... | NaN | NaN | diazePAM 2mg/5ml SF oral solution 4 mg | Inpatient | Normal | Manage Orders | 0 | 0 | NaN | #### |
| 1 | PR-######## | ####-##-## ##:##:## | ####-##-## ##:##:## | #### | ####-##-## ##:##:## | #### | 1 | 180811.0 | DIAZEPAM 2MG/5ML ORAL SOLUTION SUGAR-FREE | #### | ... | NaN | NaN | diazePAM 2mg/5ml SF oral solution 4 mg | Inpatient | Normal | Manage Orders | 0 | 1 | ####-##-## ##:##:## | #### |
| 2 | PR-######## | ####-##-## ##:##:## | ####-##-## ##:##:## | #### | ####-##-## ##:##:## | #### | 1 | 180811.0 | DIAZEPAM 2MG/5ML ORAL SOLUTION SUGAR-FREE | #### | ... | NaN | NaN | diazePAM 2mg/5ml SF oral solution 4 mg | Inpatient | Normal | Manage Orders | 0 | 1 | ####-##-## ##:##:## | #### |
| 3 | PR-######## | ####-##-## ##:##:## | ####-##-## ##:##:## | #### | ####-##-## ##:##:## | #### | 1 | 180811.0 | DIAZEPAM 2MG/5ML ORAL SOLUTION SUGAR-FREE | #### | ... | NaN | NaN | diazePAM 2mg/5ml SF oral solution 2 mg | Inpatient | Normal | Verify Orders | 0 | 0 | NaN | #### |
| 4 | PR-######## | ####-##-## ##:##:## | NaN | #### | ####-##-## ##:##:## | #### | 1 | 432.0 | AMITRIPTYLINE 10MG TABLETS | #### | ... | NaN | NaN | amitriptyline 10mg tablet | Outpatient | TTO | OP Visit Taskbar | 1 | 0 | NaN | #### |
5 rows × 34 columns
Co-meds
| project_id | admin_date | drugs | |
|---|---|---|---|
| 0 | PR-######## | ####-##-## | DIAZEPAM |
| 1 | PR-######## | ####-##-## | DIAZEPAM |
| 2 | PR-######## | ####-##-## | DIAZEPAM |
| 3 | PR-######## | ####-##-## | DIAZEPAM |
| 4 | PR-######## | ####-##-## | DIAZEPAM |
Administrations
| project_id | medication_order_id | MedicationOrderKey | start_datetime | end_datetime | prn_dose | drug_code | drug_name | admin_seq | intended_frequency | route_name | medication_order_mode_name | sched_admin_datetime | dose | drug_class | drug_subclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PR-######## | #### | #### | ####-##-## ##:##:## | ####-##-## ##:##:## | 4.00 mg | 180811.0 | DIAZEPAM | 1 | Once only | Oral | Inpatient | ####-##-## ##:##:## | 2MG/5ML ORAL SOLUTION SUGAR-FREE | Hypnotics and anxiolytics | Hypnotics |
| 1 | PR-######## | #### | #### | ####-##-## ##:##:## | ####-##-## ##:##:## | 4.00 mg | 180811.0 | DIAZEPAM | 1 | Every 6 hours PRN | Oral | Inpatient | ####-##-## ##:##:## | 2MG/5ML ORAL SOLUTION SUGAR-FREE | Hypnotics and anxiolytics | Hypnotics |
| 2 | PR-######## | #### | #### | ####-##-## ##:##:## | ####-##-## ##:##:## | 2.00 mg | 2404.0 | DIAZEPAM | 1 | THREE times a day | Oral | Inpatient | ####-##-## ##:##:## | 2MG TABLETS | Hypnotics and anxiolytics | Hypnotics |
| 3 | PR-######## | #### | #### | ####-##-## ##:##:## | ####-##-## ##:##:## | 2.00 mg | 2404.0 | DIAZEPAM | 2 | THREE times a day | Oral | Inpatient | ####-##-## ##:##:## | 2MG TABLETS | Hypnotics and anxiolytics | Hypnotics |
| 4 | PR-######## | #### | #### | ####-##-## ##:##:## | ####-##-## ##:##:## | 4.00 mg | 2404.0 | DIAZEPAM | 1 | At Night | Oral | Inpatient | ####-##-## ##:##:## | 2MG TABLETS | Hypnotics and anxiolytics | Hypnotics |
Procedures
| project_id | start_datetime | end_datetime | sequence_number | epic_procedure_source_code | epic_procedure_source_name | principal_procedure | encounter_key | proc_code_set | proc_nat_code | proc_local_code | proc_name | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | ####-##-## | 0 | General Procedure | General Procedure | 0 | #### | Custom | POC118 | POC118 | POCT BLOOD GAS: ABL90 |
| 1 | PR-######## | ####-##-## | ####-##-## | 0 | General Procedure | General Procedure | 0 | #### | Custom | LAB2709 | LAB2709 | UE+BONE+CO2 |
| 2 | PR-######## | ####-##-## | ####-##-## | 0 | General Procedure | General Procedure | 0 | #### | NaN | LAB8348 | LAB8348 | LEGACY: AKI ALERT |
| 3 | PR-######## | ####-##-## | ####-##-## | 0 | General Procedure | General Procedure | 0 | #### | Custom | LAB293 | LAB293 | FULL BLOOD COUNT WITH WBC DIFFERENTIAL |
| 4 | PR-######## | ####-##-## | ####-##-## | 0 | General Procedure | General Procedure | 0 | #### | Custom | LAB876 | LAB876 | TACROLIMUS LEVEL (FK506) |
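The drug-name parsing and class lookup in the meds cell above can be sketched on toy data; the drug strings and the class mapping here are illustrative, not study data:

```python
import pandas as pd

# Toy frame mimicking the admins_df drug_name format (illustrative values)
admins = pd.DataFrame({
    "drug_name": ["DIAZEPAM 2MG TABLETS", "AMITRIPTYLINE 10MG TABLETS"],
})

# Split once at the first whitespace run that precedes a digit, so the
# drug name stays in column 0 and the strength/formulation moves to 'dose'
admins[["drug_name", "dose"]] = admins["drug_name"].str.split(
    r"\s+(?=\d)", n=1, expand=True)

# Map drug -> pharmaceutical class; .replace leaves unmatched names
# unchanged (whereas .map would turn them into NaN)
class_name = {"DIAZEPAM": "Hypnotics and anxiolytics"}
admins["drug_class"] = admins["drug_name"].replace(class_name)
print(admins)
```

The lookahead `(?=\d)` keeps the digit with the dose column, and `n=1` guards against drug names that happen to contain a second digit-led token.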
# PRELIMINARY EDA
df = df_main.copy()
# FILTER JUST FOR ANXIETY CODES
df = df[df['diag_level_3']=='F41']
# FILTER AFTER JULY 2019
df = df[df['start_datetime']>pd.to_datetime('2019-07-01')]
# DROP duplicates
display(f"Length of dataframe before dropping duplicates = {df.shape[0]}")
df = df.drop_duplicates(['project_id','diag_local_code'])
display(f"Length of dataframe after dropping duplicates = {df.shape[0]}")
df_diag_freq = pd.DataFrame(df.groupby(pd.Grouper(key='start_datetime',freq='M'))['diag_local_code'].count().reset_index(name='count'))
display(df.head())
display(df_diag_freq.sort_values(by='start_datetime').head())
# PLOT NUMBER OF DIAGNOSES OVER TIME
fig_total_dx_over_time = px.scatter(df_diag_freq,
x='start_datetime',
y='count',
title='Monthly Total F41X Anxiety Diagnoses over time (July 2019-Present)<br><sup>Removed duplicate diagnoses per patient</sup>',
# labels={
# 'x': 'Date',
# 'index': 'Count (n)'
# },
trendline='lowess',
)
lockdown_date_1 = pd.to_datetime("2020-03-26")
lockdown_date_2 = pd.to_datetime("2020-11-05")
lockdown_date_3 = pd.to_datetime("2021-02-06")
for date in [lockdown_date_1,lockdown_date_2,lockdown_date_3]:
fig_total_dx_over_time.add_vline(
x=date,
line_dash="dot",
# annotation_text="First Lockdown",
# annotation_position='top right',
)
# fig_total_dx_over_time.update_layout(
# margin_b=90, #increase the bottom margin to have space for caption
# annotations = [dict(xref='paper',
# yref='paper',
# x=1, y=-0.25,
# showarrow=False,
# text ='This is my caption for the Plotly figure')]
# )
fig_total_dx_over_time.show()
'Length of dataframe before dropping duplicates = 9275'
'Length of dataframe after dropping duplicates = 1695'
| project_id | start_datetime | diag_local_code | encounter_key | diag_name | primary_diagnoses | diag_level_1 | diag_level_2 | diag_level_3 | |
|---|---|---|---|---|---|---|---|---|---|
| 187 | PR-######## | ####-##-## | F419 | #### | Anxiety disorder, unspecified | 1 | F | F4 | F41 |
| 262 | PR-######## | ####-##-## | F418 | #### | Other specified anxiety disorders | 0 | F | F4 | F41 |
| 392 | PR-######## | ####-##-## | F419 | #### | Anxiety disorder, unspecified | 0 | F | F4 | F41 |
| 637 | PR-######## | ####-##-## | F411 | #### | Generalized anxiety disorder | 0 | F | F4 | F41 |
| 736 | PR-######## | ####-##-## | F419 | #### | Anxiety disorder, unspecified | 0 | F | F4 | F41 |
| start_datetime | count | |
|---|---|---|
| 0 | ####-##-## | 88 |
| 1 | ####-##-## | 41 |
| 2 | ####-##-## | 43 |
| 3 | ####-##-## | 62 |
| 4 | ####-##-## | 52 |
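The monthly counting above relies on `pd.Grouper` to bucket a datetime column by calendar month. A minimal sketch with toy dates (not study data):

```python
import pandas as pd

# Toy diagnoses across two months
df = pd.DataFrame({
    "start_datetime": pd.to_datetime(
        ["2019-07-03", "2019-07-20", "2019-08-05"]),
    "diag_local_code": ["F419", "F411", "F419"],
})

# freq='M' labels each bucket with the month-end date
monthly = (df.groupby(pd.Grouper(key="start_datetime", freq="M"))
             ["diag_local_code"].count()
             .reset_index(name="count"))
print(monthly)
```

Months with no rows would appear with a count of 0, which is what keeps the time axis contiguous in the scatter plot.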
# SHOW NUMBER DIAGNOSES PER MONTH
df_per_diag = df_main.copy()
# FILTER JUST FOR ANXIETY CODES
df_per_diag = df_per_diag[df_per_diag['diag_level_3']=='F41']
# FILTER AFTER JULY 2019
df_per_diag = df_per_diag[df_per_diag['start_datetime']>pd.to_datetime('2019-07-01')]
# DROP duplicates
display(f"Length of dataframe before dropping duplicates = {df_per_diag.shape[0]}")
df_per_diag = df_per_diag.drop_duplicates(['project_id','diag_local_code'])
display(f"Length of dataframe after dropping duplicates = {df_per_diag.shape[0]}")
# df_per_diag = pd.DataFrame(df_per_diag.groupby('start_datetime')['diag_local_code'].value_counts().reset_index(name='count'))
df_per_diag = pd.DataFrame(df_per_diag.groupby(pd.Grouper(key='start_datetime',freq='M'))['diag_name'].value_counts().reset_index(name='count'))
# ADD IN PERCENTAGE NORMALISATION
df_per_diag['pct'] = 100* df_per_diag['count'] / df_per_diag.groupby('start_datetime')['count'].transform('sum')
# SCATTER PLOT NUMBER OF DIAGNOSES OVER TIME
fig_per_diag = px.bar(df_per_diag,
x='start_datetime',
y='count',
title='Stratified by anxiety code over time (July 2019-Present)<br><sup>Removed duplicate diagnoses per patient</sup>',
color='diag_name',
)
# PCT SCATTER PLOT NUMBER OF DIAGNOSES OVER TIME
fig_per_diag_pct = px.bar(df_per_diag,
x='start_datetime',
y='pct',
title='Percent Stratified by anxiety code over time (July 2019-Present)<br><sup>Removed duplicate diagnoses per patient</sup>',
color='diag_name'
)
display(df_per_diag.head())
fig_per_diag.show()
fig_per_diag_pct.show()
'Length of dataframe before dropping duplicates = 9275'
'Length of dataframe after dropping duplicates = 1695'
| start_datetime | diag_name | count | pct | |
|---|---|---|---|---|
| 0 | ####-##-## | Anxiety disorder, unspecified | 76 | 86.363636 |
| 1 | ####-##-## | Generalized anxiety disorder | 5 | 5.681818 |
| 2 | ####-##-## | Other specified anxiety disorders | 4 | 4.545455 |
| 3 | ####-##-## | Panic disorder [episodic paroxysmal anxiety] | 3 | 3.409091 |
| 4 | ####-##-## | Anxiety disorder, unspecified | 36 | 87.804878 |
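The percentage-normalisation line above uses `groupby(...).transform('sum')`, which broadcasts each group's total back onto its member rows so the division stays row-wise. A small sketch with toy counts:

```python
import pandas as pd

# Toy per-month diagnosis counts (illustrative, not study data)
df = pd.DataFrame({
    "month": ["Jul", "Jul", "Aug"],
    "count": [3, 1, 2],
})

# transform('sum') returns a column the same length as df, holding each
# row's group total, so each row becomes a share of its month
df["pct"] = 100 * df["count"] / df.groupby("month")["count"].transform("sum")
print(df)
```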
# SHOW FREQUENCIES OF EACH DIAGNOSIS TOP 10
df_freq = df_main.copy()[['project_id','diag_local_code','diag_name','diag_level_3']]
df_freq = df_freq[df_freq['diag_level_3']=='F41']
# DROP duplicates
display(f"Length of dataframe before dropping duplicates = {df_freq.shape[0]}")
df_freq = df_freq.drop_duplicates(['project_id','diag_local_code'])
display(f"Length of dataframe after dropping duplicates = {df_freq.shape[0]}")
df_freq_diag = df_freq[['diag_name','diag_local_code']].value_counts().reset_index()
df_freq_diag.columns = ['diag_name','diag_local_code','Count']
df_freq_diag['diag'] = df_freq_diag['diag_name'] + " (" + df_freq_diag['diag_local_code'] + ")"
display(df_freq_diag)
fig_top_anx_diagnoses = px.bar(df_freq_diag,
x='diag',
y='Count',
color='diag',
title='Top Diagnoses<br><sup>Removed duplicate diagnoses per patient</sup>')
fig_top_anx_diagnoses.show()
'Length of dataframe before dropping duplicates = 9668'
'Length of dataframe after dropping duplicates = 1741'
| diag_name | diag_local_code | Count | diag | |
|---|---|---|---|---|
| 0 | Anxiety disorder, unspecified | F419 | #### | Anxiety disorder, unspecified (F419) |
| 1 | Other specified anxiety disorders | F418 | 584 | Other specified anxiety disorders (F418) |
| 2 | Panic disorder [episodic paroxysmal anxiety] | F410 | 46 | Panic disorder [episodic paroxysmal anxiety] (... |
| 3 | Generalized anxiety disorder | F411 | 32 | Generalized anxiety disorder (F411) |
| 4 | Mixed anxiety and depressive disorder | F412 | 9 | Mixed anxiety and depressive disorder (F412) |
It would be good to re-run this analysis, but only for diagnoses specifically tagged as primary, just to see what comes out.
df_freq = df_main[(df_main['primary_diagnoses']==1) & (df_main['diag_level_3']=='F41')].copy()
# SHOW FREQUENCIES OF EACH DIAGNOSIS TOP 10
# DROP duplicates
display(f"Length of dataframe before dropping duplicates = {df_freq.shape[0]}")
df_freq = df_freq.drop_duplicates(['project_id','diag_local_code'])
display(f"Length of dataframe after dropping duplicates = {df_freq.shape[0]}")
df_freq_diag = df_freq[['diag_name','diag_local_code']].value_counts().reset_index()
df_freq_diag.columns = ['diag_name','diag_local_code','Count']
df_freq_diag['diag'] = df_freq_diag['diag_name'] + " (" + df_freq_diag['diag_local_code'] + ")"
display(df_freq_diag)
fig_top_anx_diag_primary = px.bar(df_freq_diag,
x='diag',
y='Count',
color='diag',
title='Top Anxiety Diagnoses where F41X is primary<br><sup>Removed duplicate diagnoses per patient</sup>')
fig_top_anx_diag_primary.show()
'Length of dataframe before dropping duplicates = 484'
'Length of dataframe after dropping duplicates = 255'
| diag_name | diag_local_code | Count | diag | |
|---|---|---|---|---|
| 0 | Anxiety disorder, unspecified | F419 | 223 | Anxiety disorder, unspecified (F419) |
| 1 | Other specified anxiety disorders | F418 | 19 | Other specified anxiety disorders (F418) |
| 2 | Generalized anxiety disorder | F411 | 8 | Generalized anxiety disorder (F411) |
| 3 | Panic disorder [episodic paroxysmal anxiety] | F410 | 5 | Panic disorder [episodic paroxysmal anxiety] (... |
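The before/after row counts printed in the cells above come from `drop_duplicates` with a column subset, which keeps one row per (project_id, diag_local_code) pair, i.e. one copy of each diagnosis per patient. A toy sketch:

```python
import pandas as pd

# Toy data: patient P1 has the same code recorded twice (illustrative)
df = pd.DataFrame({
    "project_id": ["P1", "P1", "P2"],
    "diag_local_code": ["F419", "F419", "F419"],
    "start_datetime": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2020-01-05"]),
})

# With a subset, only the listed columns are compared; the earliest
# occurrence (keep='first', the default) survives
deduped = df.drop_duplicates(["project_id", "diag_local_code"])
print(len(df), "->", len(deduped))
```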
# FULL DF
length_anxiety = df_main.copy()
# sort
length_anxiety = (
length_anxiety
.sort_values(by=['project_id','start_datetime'])
.drop_duplicates(['project_id','encounter_key','diag_local_code','start_datetime'])
)
length_anxiety = length_anxiety[['project_id','start_datetime','diag_local_code','diag_name']]
# GET just one patient for TEST DF
pts = [f'PR-########{num}' for num in range(9)]
# length_anxiety = length_anxiety[length_anxiety['project_id'].isin(length_anxiety.sample(10).project_id.unique())]
# length_anxiety = length_anxiety[length_anxiety['project_id'].isin(['PR-########','PR-########','PR-########'])]#.set_index(['project_id','start_datetime'])
# GROUPBY PROJECT ID, START_DATETIME -> CHECK IF ANY ANXIETY CODE PRESENT -> RETURN TRUE / FALSE
print('grouping df by project ID and start datetime and checking anxiety code present...')
length_anxiety['mask'] = length_anxiety['diag_local_code'].isin(ANXIETY_CODES)
length_anxiety_filtered = (
length_anxiety[['project_id','start_datetime','mask']]
.sort_values(['project_id','start_datetime','mask'])
.drop_duplicates(['project_id','start_datetime'],keep='last')
.rename(columns={'mask':'diag_local_code'})
.reset_index(drop=True)
)
print('Done!')
# FIX ANY ODD-LOOKING FALSE
# Shift to get row above and below
length_anxiety_filtered['tomoz_anx'] = length_anxiety_filtered.groupby('project_id')['diag_local_code'].shift(-1).fillna(False)
length_anxiety_filtered['previous_anx'] = length_anxiety_filtered.groupby('project_id')['diag_local_code'].shift(1).fillna(False)
length_anxiety_filtered['tomoz_diff'] = length_anxiety_filtered.groupby('project_id')['start_datetime'].shift(-1)
length_anxiety_filtered['previous_diff'] = length_anxiety_filtered.groupby('project_id')['start_datetime'].shift(1)
# function to do replacing -> SENSITIVITY SET TO 2 DAYS
def replace_false(df):
    # a non-anxiety day is treated as continuing anxiety when the diagnoses
    # on either side are both anxiety codes and fall no more than 2 days apart
    mask = (
        ((df['tomoz_diff'] - df['previous_diff']).dt.days < 3)
        & (df['diag_local_code'].eq(False))
        & (df['tomoz_anx'].eq(True))
        & (df['previous_anx'].eq(True))
    )
    # flip only the sandwiched False rows, not the whole group
    df.loc[mask, 'diag_local_code'] = True
    return df
# display('BEFORE APPLYING FALSE REPLACE', length_anxiety_filtered.set_index(['project_id','start_datetime']))
print('grouping by pt, replacing false')
length_anxiety_filtered = length_anxiety_filtered.groupby(['project_id']).apply(lambda x: replace_false(x))
# display('AFTER APPLYING FALSE REPLACE', length_anxiety_filtered.set_index(['project_id','start_datetime']))
print('Done!')
# GET DURATION OF DISCRETE GROUPS OF CONSECUTIVE TRUTHS
m = length_anxiety_filtered['diag_local_code']
dt = length_anxiety_filtered[m].groupby(['project_id', (~m).cumsum()])['start_datetime'].transform('first')
length_anxiety_filtered['duration'] = length_anxiety_filtered['start_datetime'].sub(dt).dt.days.fillna(0)
# display('DF AFTER CALC DURATION OF DISCRETE GROUPS OF CONSECUTIVE TRUTHS', length_anxiety_filtered.set_index(['project_id','start_datetime']))
# get just the cumulative duration rows by finding where diag_local_code is True followed by false
length_anxiety_filtered = length_anxiety_filtered[
(length_anxiety_filtered['diag_local_code'] == True)
&
(length_anxiety_filtered['tomoz_anx'] == False)
]
# replace 0-day durations with 1 day: duration measures the span between the first and last dates in a consecutive run, so a single-day episode would otherwise register as 0 rather than its actual 1-day duration
length_anxiety_filtered['duration'] = length_anxiety_filtered['duration'].astype(int)
length_anxiety_filtered['duration'] = length_anxiety_filtered['duration'].replace({0:1})
# display('df after finding rows which are true followed by False', 'Added 1 to 0 consecutive anxiety days to get actual duration of anxiety',length_anxiety_filtered.set_index(['project_id','start_datetime']))
# turn duration col back to datetime
length_anxiety_filtered['duration'] = pd.to_timedelta(length_anxiety_filtered['duration'],unit='d')
#get start date of anxiety col
length_anxiety_filtered['start_anx_date'] = (
length_anxiety_filtered['start_datetime']
-
length_anxiety_filtered['duration']
)
# GET FINAL DF
length_anxiety_filtered = length_anxiety_filtered.rename(columns={'start_datetime':'end_anx_date'}).drop(['diag_local_code','tomoz_anx'],axis=1)[['project_id','start_anx_date','duration','end_anx_date']]
length_anxiety_filtered.head(10)
grouping df by project ID and start datetime and checking anxiety code present...
Done!
grouping by pt, replacing false
Done!
| project_id | start_anx_date | duration | end_anx_date | |
|---|---|---|---|---|
| 5 | PR-######## | ####-##-## | 6 days | ####-##-## |
| 13 | PR-######## | ####-##-## | 2 days | ####-##-## |
| 26 | PR-######## | ####-##-## | 4 days | ####-##-## |
| 33 | PR-######## | ####-##-## | 4 days | ####-##-## |
| 35 | PR-######## | ####-##-## | 1 days | ####-##-## |
| 39 | PR-######## | ####-##-## | 21 days | ####-##-## |
| 42 | PR-######## | ####-##-## | 1 days | ####-##-## |
| 45 | PR-######## | ####-##-## | 1 days | ####-##-## |
| 50 | PR-######## | ####-##-## | 9 days | ####-##-## |
| 71 | PR-######## | ####-##-## | 1 days | ####-##-## |
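The duration calculation above hinges on the `(~m).cumsum()` trick: the cumulative sum increments at every False, so each unbroken run of True values shares one label and can be grouped. A toy illustration:

```python
import pandas as pd

# Two runs of consecutive Trues, separated by a single False
m = pd.Series([True, True, False, True, True, True])

# Each False bumps the counter, giving every True-run a distinct label
run_id = (~m).cumsum()
print(run_id.tolist())

# Restricting to the True rows and grouping by run label yields
# per-run statistics (here, run sizes; in the notebook, 'first' dates)
run_sizes = m[m].groupby(run_id).size()
print(run_sizes.tolist())
```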
length_anxiety_df = length_anxiety_filtered.copy()
# find freq distribution
length_anxiety_df_counts = length_anxiety_df['duration'].value_counts().reset_index(name='count').rename(columns={'index':'duration_anxiety'})
length_anxiety_df_counts['duration_anxiety'] = length_anxiety_df_counts['duration_anxiety'].dt.days
freq_dist_fig_anx_length = px.bar(length_anxiety_df_counts,
x='duration_anxiety',
y='count',
title='Frequency distribution of anxiety diagnoses (F41x) duration',
log_y=True,
log_x=False)
freq_dist_fig_anx_length.show()
# SAME FIG BUT WHERE COUNT > 20
freq_dist_fig_anx_length_top_20 = px.bar(length_anxiety_df_counts[length_anxiety_df_counts['count'].gt(20)],
x='duration_anxiety',
y='count',
title='Frequency distribution (n>20) of anxiety diagnoses (F41x) duration',
log_y=True,
log_x=False)
freq_dist_fig_anx_length_top_20.show()
display('Original df')
display(length_anxiety_df.head(10))
display('Frequency distribution duration values')
display(length_anxiety_df_counts.head(30))
'Original df'
| project_id | start_anx_date | duration | end_anx_date | |
|---|---|---|---|---|
| 5 | PR-######## | ####-##-## | 6 days | ####-##-## |
| 13 | PR-######## | ####-##-## | 2 days | ####-##-## |
| 26 | PR-######## | ####-##-## | 4 days | ####-##-## |
| 33 | PR-######## | ####-##-## | 4 days | ####-##-## |
| 35 | PR-######## | ####-##-## | 1 days | ####-##-## |
| 39 | PR-######## | ####-##-## | 21 days | ####-##-## |
| 42 | PR-######## | ####-##-## | 1 days | ####-##-## |
| 45 | PR-######## | ####-##-## | 1 days | ####-##-## |
| 50 | PR-######## | ####-##-## | 9 days | ####-##-## |
| 71 | PR-######## | ####-##-## | 1 days | ####-##-## |
'Frequency distribution duration values'
| duration_anxiety | count | |
|---|---|---|
| 0 | 1 | #### |
| 1 | 4 | 164 |
| 2 | 2 | 133 |
| 3 | 8 | 96 |
| 4 | 7 | 88 |
| 5 | 6 | 81 |
| 6 | 5 | 74 |
| 7 | 3 | 68 |
| 8 | 14 | 66 |
| 9 | 13 | 56 |
| 10 | 15 | 55 |
| 11 | 9 | 55 |
| 12 | 10 | 53 |
| 13 | 12 | 47 |
| 14 | 11 | 47 |
| 15 | 21 | 44 |
| 16 | 20 | 43 |
| 17 | 16 | 41 |
| 18 | 18 | 40 |
| 19 | 17 | 37 |
| 20 | 26 | 36 |
| 21 | 23 | 33 |
| 22 | 19 | 30 |
| 23 | 29 | 29 |
| 24 | 22 | 27 |
| 25 | 24 | 25 |
| 26 | 28 | 25 |
| 27 | 33 | 22 |
| 28 | 25 | 22 |
| 29 | 35 | 21 |
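Building the frequency table above involves a `value_counts` over a timedelta column followed by `.dt.days` to get a plottable integer axis. A version-proof sketch with toy durations (naming the columns explicitly avoids the `reset_index` column-name differences between pandas 1.x and 2.x):

```python
import pandas as pd

# Toy anxiety durations (illustrative values)
durations = pd.Series(pd.to_timedelta([1, 1, 4, 2], unit="d"))

# value_counts sorts by descending frequency; the index holds the
# timedelta values themselves
counts = durations.value_counts().reset_index()
counts.columns = ["duration_anxiety", "count"]

# .dt.days converts each timedelta to a plain integer day count
counts["duration_anxiety"] = counts["duration_anxiety"].dt.days
print(counts)
```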
diag_df = length_anxiety_df.copy().reset_index(drop=True).drop('duration',axis=1)
adm_df = admins_df.copy()
######## TEST ###########
# test_pts = ['PR-'+num for num in ['########','########','########']]
# diag_df = diag_df[diag_df['project_id'].isin(test_pts)]
# adm_df = adm_df[adm_df['project_id'].isin(test_pts)]
######## TEST ###########
size_diag = diag_df.shape
size_adm = adm_df.shape
print(f"Size of diag {size_diag}.\nSize of administrations {size_adm}")
# ADM DF EDITS
# # ROLL DOWN DATETIME TO JUST DATE
adm_df['start_datetime'] = pd.to_datetime(adm_df['start_datetime']).dt.normalize()
# select cols
adm_df = adm_df[['project_id','start_datetime','drug_name','drug_class', 'drug_subclass']]
# rename cols
adm_df = adm_df.rename(columns={'start_datetime':'date_anx'})
display('Start diag df',diag_df.head())
diag_df = (diag_df.assign(date_anx = [pd.date_range(start, end)
for start, end
in zip(diag_df['start_anx_date'], diag_df['end_anx_date'])]
)
.explode('date_anx', ignore_index = True)
.drop(['start_anx_date','end_anx_date'],axis=1)
)
display(f'Diag df after getting dates between start and end {diag_df.shape}',diag_df.head())
display(f'Start admins df {adm_df.shape}',adm_df.head())
adm_diag = diag_df.merge(
adm_df,
how='left')
display(f'Df after outer merge {adm_diag.shape}',adm_diag.head())
# display(f'Df after outer merge {adm_diag.shape}',adm_diag)
adm_diag = adm_diag.fillna('no_drug').reset_index(drop=True)
# add month, year cols
adm_diag['month'] = adm_diag['date_anx'].dt.strftime('%b')
adm_diag['year'] = adm_diag['date_anx'].dt.strftime('%Y')
display(f'Final df, outer merge, convert NaN, add month, year cols {adm_diag.shape}',adm_diag.head())
# display(f'Final df, outer merge, dropped NaN {adm_diag.shape}',adm_diag)
Size of diag (7748, 3). Size of administrations (5670, 16)
'Start diag df'
| project_id | start_anx_date | end_anx_date | |
|---|---|---|---|
| 0 | PR-######## | ####-##-## | ####-##-## |
| 1 | PR-######## | ####-##-## | ####-##-## |
| 2 | PR-######## | ####-##-## | ####-##-## |
| 3 | PR-######## | ####-##-## | ####-##-## |
| 4 | PR-######## | ####-##-## | ####-##-## |
'Diag df after getting dates between start and end (1999393, 2)'
| project_id | date_anx | |
|---|---|---|
| 0 | PR-######## | ####-##-## |
| 1 | PR-######## | ####-##-## |
| 2 | PR-######## | ####-##-## |
| 3 | PR-######## | ####-##-## |
| 4 | PR-######## | ####-##-## |
'Start admins df (5670, 5)'
| project_id | date_anx | drug_name | drug_class | drug_subclass | |
|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
| 1 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
| 2 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
| 3 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
| 4 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
'Df after outer merge (2006770, 5)'
| project_id | date_anx | drug_name | drug_class | drug_subclass | |
|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | NaN | NaN | NaN |
| 1 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
| 2 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics |
| 3 | PR-######## | ####-##-## | NaN | NaN | NaN |
| 4 | PR-######## | ####-##-## | NaN | NaN | NaN |
'Final df, outer merge, convert NaN, add month, year cols (2006770, 7)'
| project_id | date_anx | drug_name | drug_class | drug_subclass | month | year | |
|---|---|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | no_drug | no_drug | no_drug | Apr | #### |
| 1 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics | Apr | #### |
| 2 | PR-######## | ####-##-## | DIAZEPAM | Hypnotics and anxiolytics | Hypnotics | Apr | #### |
| 3 | PR-######## | ####-##-## | no_drug | no_drug | no_drug | Apr | #### |
| 4 | PR-######## | ####-##-## | no_drug | no_drug | no_drug | Apr | #### |
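The interval-to-daily expansion above combines `pd.date_range` with `explode`. A minimal sketch on toy dates:

```python
import pandas as pd

# Toy anxiety episodes: one row per (start, end) interval (illustrative)
df = pd.DataFrame({
    "project_id": ["A", "B"],
    "start": pd.to_datetime(["2020-01-01", "2020-02-10"]),
    "end": pd.to_datetime(["2020-01-03", "2020-02-10"]),
})

# Build a list of dates per row, then explode so each date gets its own
# row; ignore_index gives a clean 0..n-1 index afterwards
daily = (df.assign(date=[pd.date_range(s, e)
                         for s, e in zip(df["start"], df["end"])])
           .explode("date", ignore_index=True)
           .drop(columns=["start", "end"]))
print(daily)
```

Note that this blows up row counts quickly (the notebook goes from ~7.7k intervals to ~2M daily rows), so it is worth doing after any filtering, not before.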
def make_comorb_df(comorb_df = diag_df.copy(), main_df = df_main.copy(), level='diag_level_1'):
#deselect encounter_key col, primary_diagnoses
main_df = main_df.drop(['encounter_key','primary_diagnoses'],axis=1)
comorb_df = comorb_df.rename(columns={'date_anx':'start_datetime'})
# merge to just get days where F41x code present
comorb_df = comorb_df.merge(main_df)
# display(comorb_df.head(100).set_index(['project_id','start_datetime','diag_local_code']).sort_values(by=['project_id','start_datetime','diag_local_code']))
# # DROP duplicates
display(f"Length of dataframe before dropping duplicates = {comorb_df.shape[0]}")
comorb_df = comorb_df.drop_duplicates()
display(f"Length of dataframe after dropping duplicates = {comorb_df.shape[0]}")
# filter out F41x codes to just get comorb
comorb_df = comorb_df[comorb_df['diag_level_3'].ne('F41')]
# filter out any other codes
comorb_df = comorb_df[~comorb_df['diag_level_1'].isin(['Z','R'])]
# GRAB TOP 20 VALUE COUNTS OF CODES
comorb_df_top_20 = comorb_df[[level]].value_counts().head(20).reset_index(name='count').rename(columns={'index':level})
# ADD IN PERCENTAGE NORMALISATION
comorb_df_top_20['pct'] = (comorb_df_top_20['count'] / comorb_df_top_20['count'].sum())*100
# display(f'Top co-morbidities at {level}', comorb_df_top_20)
# PCT SCATTER PLOT NUMBER OF DIAGNOSES OVER TIME
fig_comorb_fig = px.bar(comorb_df_top_20,
x=level,
y='pct',
title=f'20 Most common co-morbidities per anxiety-diagnosed encounter at level {level}<br>(excluding codes: R, Z)',
color=level
)
return (fig_comorb_fig, comorb_df_top_20)
comorb_fig_diag_name, comorb_table_diag_name = make_comorb_df(level='diag_name')
comorb_fig_diag_name.show()
display(comorb_table_diag_name)
comorb_fig_level_1, comorb_table_level_1 = make_comorb_df(level='diag_level_1')
comorb_fig_level_1.show()
display(comorb_table_level_1)
'Length of dataframe before dropping duplicates = 4444998'
'Length of dataframe after dropping duplicates = 96940'
| diag_name | count | pct | |
|---|---|---|---|
| 0 | Chronic kidney disease, stage 5 | #### | 9.836370 |
| 1 | Childhood autism | #### | 9.390108 |
| 2 | Chronic kidney disease, unspecified | 907 | 8.432503 |
| 3 | Constipation | 881 | 8.190777 |
| 4 | Developmental disorder of scholastic skills, u... | 684 | 6.359241 |
| 5 | Disorders of amino-acid transport | 553 | 5.141316 |
| 6 | Other disorders of kidney and ureter in other ... | 545 | 5.066939 |
| 7 | Secondary hypertension, unspecified | 522 | 4.853105 |
| 8 | Other specified disorders of cornea | 505 | 4.695054 |
| 9 | Cardiomegaly | 449 | 4.174414 |
| 10 | Acute lymphoblastic leukaemia [ALL] | 429 | 3.988472 |
| 11 | Noninfective gastroenteritis and colitis, unsp... | 403 | 3.746746 |
| 12 | Reserved for concepts with insufficient inform... | 385 | 3.579398 |
| 13 | Specific (isolated) phobias | 364 | 3.384158 |
| 14 | Gastro-oesophageal reflux disease without oeso... | 355 | 3.300483 |
| 15 | Other mucopolysaccharidoses | 354 | 3.291186 |
| 16 | Anaemia in other chronic diseases classified e... | 353 | 3.281889 |
| 17 | Other forms of herpesviral infection | 342 | 3.179621 |
| 18 | Kidney transplant failure and rejection | 329 | 3.058758 |
| 19 | Asthma, unspecified | 328 | 3.049461 |
'Length of dataframe before dropping duplicates = 4444998'
'Length of dataframe after dropping duplicates = 96940'
| | diag_level_1 | count | pct |
|---|---|---|---|
| 0 | Q | #### | 10.877843 |
| 1 | K | #### | 10.137367 |
| 2 | F | #### | 9.054000 |
| 3 | N | #### | 8.141027 |
| 4 | E | #### | 8.082126 |
| 5 | I | #### | 8.029535 |
| 6 | M | #### | 7.922250 |
| 7 | H | #### | 6.565412 |
| 8 | G | #### | 6.403433 |
| 9 | J | #### | 4.270358 |
| 10 | D | #### | 4.200938 |
| 11 | C | #### | 3.813871 |
| 12 | L | #### | 3.773902 |
| 13 | T | #### | 2.408650 |
| 14 | B | 913 | 1.920609 |
| 15 | Y | 909 | 1.912195 |
| 16 | U | 709 | 1.491470 |
| 17 | A | 234 | 0.492248 |
| 18 | S | 147 | 0.309233 |
| 19 | P | 92 | 0.193533 |
top_meds = adm_diag.copy()
# PLOT OVERALL
overall_top = top_meds[top_meds['drug_name'].ne('no_drug')].drug_name.value_counts().reset_index()
overall_top.columns = ['drug','count']
overall_meds_fig = px.bar(overall_top,
x='drug',
y='count',
title='Overall frequency of anxiolytic/anti-depressant prescriptions for F41x patients',
color='drug',
log_y = False)
overall_meds_fig.show()
drug_time = adm_diag.copy()
######## TEST ###########
# test_pts = ['PR-'+num for num in ['########','########','########','########']]
# drug_time = drug_time[drug_time['project_id'].isin(test_pts)]
# adm_df = adm_df[adm_df['project_id'].isin(test_pts)]
# drug_type = 'drug_name'
######## TEST ###########
# filter after April 2019
drug_time = drug_time[
(drug_time['date_anx'] > pd.to_datetime('2019-04-01'))
]
# group datetime col into monthly
type_counts = (
drug_time.groupby(['project_id',pd.Grouper(key='date_anx',freq='M')])
['drug_name'].unique()
.explode()
.reset_index(name='drug_name')
.sort_values(by=['date_anx','project_id','drug_name'])
)
#TEST - shows that when grouped monthly, there are entries where the same patient received a drug and also didn't
# type_counts = type_counts[type_counts['date_anx'].isin(['2019-04-30','2019-05-31'])].sort_values(['date_anx','project_id','drug_name'])
# display(
# type_counts
# # .drop_duplicates('project_id')
# # .sort_values(by=['drug_name'])
# .reset_index(drop=True)
# )
# get total num patients in month
total_pts_per_month = type_counts.groupby(['date_anx'])['project_id'].count().reset_index(name='total_pts')
# display('TOtal pts per month is',total_pts_per_month)
# merge total pts per month with drug_counts per month
type_counts= (type_counts
.groupby('date_anx')
['drug_name'].value_counts()
.reset_index(name='drug_count')
.merge(total_pts_per_month)
)
# display('Type counts is: ',type_counts)
# get pct per month
type_counts['pct'] = 100 * type_counts['drug_count'] / type_counts.groupby('date_anx')['drug_count'].transform('sum')
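The per-month percentage uses the `groupby(...).transform('sum')` idiom: `transform` returns a Series aligned row-for-row with the original frame, so each drug count is divided by its own month's total. A minimal sketch with invented counts:

```python
import pandas as pd

counts = pd.DataFrame({
    'date_anx': ['2019-04-30', '2019-04-30', '2019-05-31'],
    'drug_count': [1, 3, 2],
})

# transform('sum') broadcasts each month's total back onto its rows
counts['pct'] = 100 * counts['drug_count'] / counts.groupby('date_anx')['drug_count'].transform('sum')
print(counts['pct'].tolist())  # [25.0, 75.0, 100.0]
```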
#filter out no_drug
# type_counts = type_counts[type_counts['drug_name'].ne('no_drug')]
# add in drug class cols
type_counts['drug_class'] = type_counts['drug_name'].replace(class_name)
type_counts['drug_subclass'] = type_counts['drug_name'].replace(subclass_name)
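`Series.replace` (rather than `Series.map`) is used for the class lookups above so that values missing from the mapping — notably `'no_drug'` — pass through unchanged instead of becoming `NaN`. The contrast, sketched with a hypothetical two-entry lookup (the real `class_name` dict is built elsewhere in the notebook):

```python
import pandas as pd

# hypothetical lookup; the notebook's class_name dict is larger
class_lookup = {
    'DIAZEPAM': 'Hypnotics and anxiolytics',
    'SERTRALINE': 'Antidepressants',
}
names = pd.Series(['DIAZEPAM', 'no_drug', 'SERTRALINE'])

# replace: unmatched values ('no_drug') are kept as-is
print(names.replace(class_lookup).tolist())

# map: unmatched values become NaN
print(names.map(class_lookup).isna().tolist())
```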
#plot
drug_admin_yearly_fig = px.area(type_counts,
x='date_anx',
y='pct',
color='drug_class',
line_group='drug_name',
pattern_shape='drug_name',
title=f"""
Proportion of drugs administered (monthly) to patients diagnosed with anxiety over time
<br><sup>'no_drug' refers to pts diagnosed with F41x on a day without drug administration</sup>
<br><sup>Colour corresponds to drug class</sup>
""")
# changing orientation of the legend
drug_admin_yearly_fig.update_layout(legend=dict(
orientation="h",
))
drug_admin_time_line_fig = px.line(
type_counts,
x='date_anx',
y='pct',
color='drug_name',
# trendline='lowess',
title=f"""
Proportion drugs administered to patients diagnosed with anxiety over time
<br><sup>'no_drug' refers to pts diagnosed with F41x on a day without drug administration</sup>
<br><sup>Colour corresponds to drug name</sup>
"""
)
drug_admin_time_line_fig.show()
#show df
# type_counts=type_counts.sort_values(by=['date_anx','drug_name']).set_index(['date_anx','drug_name'])
drug_admin_yearly_fig.show()
display(type_counts.head())
| | date_anx | drug_name | drug_count | total_pts | pct | drug_class | drug_subclass |
|---|---|---|---|---|---|---|---|
| 0 | ####-##-## | no_drug | 32 | 35 | 91.428571 | no_drug | no_drug |
| 1 | ####-##-## | DIAZEPAM | 1 | 35 | 2.857143 | Hypnotics and anxiolytics | Hypnotics |
| 2 | ####-##-## | MELATONIN | 1 | 35 | 2.857143 | Hypnotics and anxiolytics | Hypnotics |
| 3 | ####-##-## | SERTRALINE | 1 | 35 | 2.857143 | Antidepressants | Selective serotonin re-uptake inhibitors |
| 4 | ####-##-## | no_drug | 81 | 82 | 98.780488 | no_drug | no_drug |
drug_time = adm_diag.copy()
######## TEST ###########
# test_pts = ['PR-'+num for num in ['########','########','########','########']]
# drug_time = drug_time[drug_time['project_id'].isin(test_pts)]
# adm_df = adm_df[adm_df['project_id'].isin(test_pts)]
# drug_type = 'drug_name'
######## TEST ###########
drug_class_type = ['drug_name','drug_class','drug_subclass'] # list for for loop
# filter after April 2019
drug_time = drug_time[
(drug_time['date_anx'] > pd.to_datetime('2019-04-01'))
]
drug_time_figs = []
for drug_type in drug_class_type:
# group datetime col into monthly
type_counts = (
drug_time.groupby(['project_id',pd.Grouper(key='date_anx',freq='M')])
[drug_type].unique()
.explode()
.reset_index(name=drug_type)
.sort_values(by=['date_anx','project_id',drug_type])
)
#TEST - shows that when grouped monthly, there are entries where the same patient received a drug and also didn't
# type_counts = type_counts[type_counts['date_anx'].isin(['2019-04-30'])].sort_values(['date_anx','project_id',drug_type])
# display(
# type_counts
# # .drop_duplicates('project_id')
# # .sort_values(by=['drug_name'])
# .reset_index(drop=True)
# )
# get total num patients in month
total_pts_per_month = type_counts.groupby(['date_anx'])['project_id'].count().reset_index(name='total_pts')
# display('TOtal pts per month is',total_pts_per_month)
# merge total pts per month with drug_counts per month
type_counts= (type_counts
.groupby('date_anx')
[drug_type].value_counts()
.reset_index(name='drug_count')
.merge(total_pts_per_month)
)
# display('Type counts is: ',type_counts)
# get pct per month
type_counts['pct'] = 100 * type_counts['drug_count'] / type_counts.groupby('date_anx')['drug_count'].transform('sum')
#filter out no_drug
# type_counts = type_counts[type_counts[drug_type].ne('no_drug')]
#plot
fig = px.bar(type_counts,
x='date_anx',
y='pct',
color=drug_type,
title=f"""
Proportion drugs administered to patients diagnosed with anxiety over time ({drug_type})
<br><sup>'no_drug' refers to pts diagnosed with F41x on a day without drug administration</sup>
""")
#show df
type_counts=type_counts.sort_values(by=['date_anx',drug_type]).set_index(['date_anx',drug_type])
# add fig and df to list
drug_time_figs.append((fig,type_counts))
for fig in drug_time_figs:
fig[0].show()
display(fig[1].head(10))
| date_anx | drug_name | drug_count | total_pts | pct |
|---|---|---|---|---|
| 2019-04-30 | DIAZEPAM | 1 | 35 | 2.857143 |
| | MELATONIN | 1 | 35 | 2.857143 |
| | SERTRALINE | 1 | 35 | 2.857143 |
| | no_drug | 32 | 35 | 91.428571 |
| 2019-05-31 | SERTRALINE | 1 | 82 | 1.219512 |
| | no_drug | 81 | 82 | 98.780488 |
| 2019-06-30 | CHLORAL HYDRATE | 1 | 111 | 0.900901 |
| | CLOBAZAM | 1 | 111 | 0.900901 |
| | DIAZEPAM | 1 | 111 | 0.900901 |
| | FLUOXETINE | 1 | 111 | 0.900901 |
| date_anx | drug_class | drug_count | total_pts | pct |
|---|---|---|---|---|
| 2019-04-30 | Antidepressants | 1 | 35 | 2.857143 |
| | Hypnotics and anxiolytics | 2 | 35 | 5.714286 |
| | no_drug | 32 | 35 | 91.428571 |
| 2019-05-31 | Antidepressants | 1 | 82 | 1.219512 |
| | no_drug | 81 | 82 | 98.780488 |
| 2019-06-30 | Antidepressants | 2 | 109 | 1.834862 |
| | Hypnotics and anxiolytics | 5 | 109 | 4.587156 |
| | no_drug | 102 | 109 | 93.577982 |
| 2019-07-31 | Antidepressants | 2 | 126 | 1.587302 |
| | Hypnotics and anxiolytics | 2 | 126 | 1.587302 |
| date_anx | drug_subclass | drug_count | total_pts | pct |
|---|---|---|---|---|
| 2019-04-30 | Hypnotics | 2 | 35 | 5.714286 |
| | Selective serotonin re-uptake inhibitors | 1 | 35 | 2.857143 |
| | no_drug | 32 | 35 | 91.428571 |
| 2019-05-31 | Selective serotonin re-uptake inhibitors | 1 | 82 | 1.219512 |
| | no_drug | 81 | 82 | 98.780488 |
| 2019-06-30 | Anxiolytics | 1 | 109 | 0.917431 |
| | Hypnotics | 4 | 109 | 3.669725 |
| | Selective serotonin re-uptake inhibitors | 2 | 109 | 1.834862 |
| | no_drug | 102 | 109 | 93.577982 |
| 2019-07-31 | Anxiolytics | 1 | 127 | 0.787402 |
multiple_anx = df_main.copy()
# DROP duplicates
display(f"Length of dataframe before dropping duplicates = {multiple_anx.shape[0]}")
multiple_anx = multiple_anx.drop_duplicates(['project_id','diag_local_code'])
display(f"Length of dataframe after dropping duplicates = {multiple_anx.shape[0]}")
multiple_anx = multiple_anx[multiple_anx['diag_level_3']=='F41']
display(multiple_anx.groupby(['project_id'])['diag_local_code'].value_counts().reset_index(name='x').diag_local_code.value_counts())
multiple_anx = multiple_anx.groupby(['project_id'])['diag_level_3'].count().reset_index(name='F41X_count')
display( multiple_anx.head())
multiple_anx_counts = multiple_anx['F41X_count'].value_counts().reset_index(name='counts')
multiple_anx_counts.columns = ['F41X_number','count']
multiple_anx_counts['pct'] = multiple_anx_counts['count'] / multiple_anx_counts['count'].sum()*100
display(multiple_anx_counts)
fig = px.bar(multiple_anx_counts,
x='F41X_number',
y='pct',
             title = 'Proportion of patients by number of distinct F41X codes (<em>any combination</em>)<br><sub>Duplicate diagnoses within the same patient removed</sub>')
fig.show()
'Length of dataframe before dropping duplicates = 309989'
'Length of dataframe after dropping duplicates = 32273'
F419    1070
F418     584
F410      46
F411      32
F412       9
Name: diag_local_code, dtype: int64
| | project_id | F41X_count |
|---|---|---|
| 0 | PR-######## | 1 |
| 1 | PR-######## | 1 |
| 2 | PR-######## | 1 |
| 3 | PR-######## | 2 |
| 4 | PR-######## | 1 |
| | F41X_number | count | pct |
|---|---|---|---|
| 0 | 1 | #### | 89.510490 |
| 1 | 2 | 162 | 10.298792 |
| 2 | 3 | 3 | 0.190718 |
# FIND FREQUENCY OF REFERRING SPECIALTIES
# merge admissions df
diag_adm_df = df_main.merge(admissions_df, on='encounter_key')
# GET LIST OF UNIQUE SPECIALTIES, ASSIGN EACH A COLOUR
diag_adm_df_filtered = diag_adm_df.copy()
specialty_freq = diag_adm_df_filtered['hospital_service'].value_counts().reset_index(name='Count').rename(columns={'index':'Specialty'})
list_specialties = list(specialty_freq['Specialty'].unique())
list_colours = px.colors.qualitative.Dark24 + px.colors.qualitative.Light24
speciality_colours = dict(zip(list_specialties, list_colours))
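Pairing the full specialty list with a palette via `dict(zip(...))` fixes one colour per specialty up front, so each per-code figure colours a given specialty identically even though every figure shows a different top-10 subset. The idea, sketched with stand-in hex values:

```python
# stand-in specialties/colours; the notebook zips the full specialty list
# against px.colors.qualitative.Dark24 + Light24
specialties = ['Neurology', 'Gastroenterology', 'Nephrology']
palette = ['#EB663B', '#1CA71C', '#2E91E5']

colour_map = dict(zip(specialties, palette))

# any subset, in any order, resolves to the same colours
subset = ['Nephrology', 'Neurology']
print([colour_map[s] for s in subset])  # ['#2E91E5', '#EB663B']
```

Plotly Express can also take such a dict directly via `color_discrete_map`, which maps category names to colours regardless of row order, whereas `color_discrete_sequence` assigns colours in trace order.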
for code in ANXIETY_CODES:
diag_adm_df_filtered = diag_adm_df[diag_adm_df['diag_local_code']==code]
print(diag_adm_df_filtered.shape)
if diag_adm_df_filtered.shape[0]:
specialty_freq = diag_adm_df_filtered['hospital_service'].value_counts().reset_index(name='Count').rename(columns={'index':'Specialty'}).head(10)
specialty_freq['pct'] = specialty_freq['Count'] / specialty_freq['Count'].sum() *100
specialty_freq['speciality_colour'] = specialty_freq['Specialty'].map(speciality_colours)
display(specialty_freq)
# fig = go.Figure(data = [go.Bar(
# x = specialty_freq['Specialty'],
# y = specialty_freq['pct'],
# # title=f'Proportion speciality referrals for {code}',
# # color='Specialty',
# marker_color = specialty_freq['speciality_colour']
# )])
fig = px.bar(specialty_freq,
x='Specialty',
y='pct',
title=f'Proportion speciality referrals for {code}',
color='Specialty',
color_discrete_sequence = specialty_freq['speciality_colour']
)
fig.show()
else:
print(f'No results found for {code}')
(3447, 13)
| | Specialty | Count | pct | speciality_colour |
|---|---|---|---|---|
| 0 | Neurology | 297 | 13.384407 | #EB663B |
| 1 | Gastroenterology | 292 | 13.159081 | #1CA71C |
| 2 | Nephrology | 274 | 12.347904 | #2E91E5 |
| 3 | Rheumatology | 252 | 11.356467 | #B68100 |
| 4 | Dermatology | 227 | 10.229833 | #B2828D |
| 5 | SNAPS | 194 | 8.742677 | #DA16FF |
| 6 | Haematology | 191 | 8.607481 | #FB0D0D |
| 7 | Oncology | 170 | 7.661109 | #E15F99 |
| 8 | Cardiology | 161 | 7.255521 | #778AAE |
| 9 | Urology | 161 | 7.255521 | #222A2A |
(3029, 13)
| | Specialty | Count | pct | speciality_colour |
|---|---|---|---|---|
| 0 | Haematology | 323 | 14.796152 | #FB0D0D |
| 1 | Oncology | 291 | 13.330279 | #E15F99 |
| 2 | Gastroenterology | 288 | 13.192854 | #1CA71C |
| 3 | Nephrology | 270 | 12.368301 | #2E91E5 |
| 4 | Rheumatology | 211 | 9.665598 | #B68100 |
| 5 | Urology | 202 | 9.253321 | #222A2A |
| 6 | SNAPS | 187 | 8.566193 | #DA16FF |
| 7 | Neurosurgery | 149 | 6.825470 | #750D86 |
| 8 | Dermatology | 135 | 6.184150 | #B2828D |
| 9 | Ear Nose and Throat | 127 | 5.817682 | #FB00D1 |
(168, 13)
| | Specialty | Count | pct | speciality_colour |
|---|---|---|---|---|
| 0 | Rheumatology | 46 | 32.167832 | #B68100 |
| 1 | Neuromuscular | 26 | 18.181818 | #DA60CA |
| 2 | SNAPS | 16 | 11.188811 | #DA16FF |
| 3 | Respiratory Medicine | 12 | 8.391608 | #511CFB |
| 4 | Cardiology | 10 | 6.993007 | #778AAE |
| 5 | Neurosurgery | 8 | 5.594406 | #750D86 |
| 6 | Urology | 7 | 4.895105 | #222A2A |
| 7 | Gastroenterology | 6 | 4.195804 | #1CA71C |
| 8 | Neurology | 6 | 4.195804 | #EB663B |
| 9 | Ear Nose and Throat | 6 | 4.195804 | #FB00D1 |
(143, 13)
| | Specialty | Count | pct | speciality_colour |
|---|---|---|---|---|
| 0 | Bone Marrow Transplant | 30 | 28.037383 | #6C4516 |
| 1 | Cardiology | 10 | 9.345794 | #778AAE |
| 2 | Rheumatology | 10 | 9.345794 | #B68100 |
| 3 | Dermatology | 10 | 9.345794 | #B2828D |
| 4 | Neurology | 9 | 8.411215 | #EB663B |
| 5 | Metabolic Medicine | 8 | 7.476636 | #00A08B |
| 6 | Cardiothoracic Transplantation | 8 | 7.476636 | #6C7C32 |
| 7 | Gastroenterology | 8 | 7.476636 | #1CA71C |
| 8 | Haematology | 8 | 7.476636 | #FB0D0D |
| 9 | Orthopaedics | 6 | 5.607477 | #A777F1 |
(32, 13)
| | Specialty | Count | pct | speciality_colour |
|---|---|---|---|---|
| 0 | Neurology | 12 | 37.500 | #EB663B |
| 1 | Neurosurgery | 5 | 15.625 | #750D86 |
| 2 | Urology | 4 | 12.500 | #222A2A |
| 3 | CAMHS | 3 | 9.375 | #479B55 |
| 4 | Epilepsy | 2 | 6.250 | #FD3216 |
| 5 | Immunology | 2 | 6.250 | #862A16 |
| 6 | Gastroenterology | 2 | 6.250 | #1CA71C |
| 7 | Plastic Surgery | 2 | 6.250 | #AF0038 |
temp_diag = diag_df.copy()
proc_anx = proc_df[proc_df['proc_code_set']=='OPCS-4'][[
'project_id', 'start_datetime', 'proc_name','proc_local_code',
]].copy()
######## TEST ###########
# test_pts = ['PR-'+num for num in ['########','########']]
# temp_diag = temp_diag[temp_diag['project_id'].isin(test_pts)]
# proc_anx = proc_anx[proc_anx['project_id'].isin(test_pts)]
######## TEST ###########
# display(temp_diag.head())
# filter after April 2019
temp_diag = temp_diag[temp_diag['date_anx'] > pd.to_datetime('2019-04-01')]
display('anx diag before merge', diag_df.head(), diag_df.shape)
# rename procs datetime column for merge
proc_anx = proc_anx.rename(columns={'start_datetime':'date_anx'})
display('procs before merge', proc_anx.head(), proc_anx.shape)
# DO LEFT MERGE
proc_diag = temp_diag.merge(
proc_anx,
how='left',
)
display(f'Df after left merge {proc_diag.shape}. Keeping all days where anxiety diagnosed',proc_diag.head())
print('Replacing na...')
# replace na
proc_diag = proc_diag.fillna('no_proc').reset_index(drop=True)
print('Done!')
# add month, year cols
proc_diag['month'] = proc_diag['date_anx'].dt.strftime('%b')
proc_diag['year'] = proc_diag['date_anx'].dt.strftime('%Y')
display(f'Final df, left merge, convert NaN, add month, year cols {proc_diag.shape}',proc_diag.head())
# display(f'Final df, outer merge, dropped NaN {adm_diag.shape}',adm_diag)
'anx diag before merge'
| | project_id | date_anx |
|---|---|---|
| 0 | PR-######## | ####-##-## |
| 1 | PR-######## | ####-##-## |
| 2 | PR-######## | ####-##-## |
| 3 | PR-######## | ####-##-## |
| 4 | PR-######## | ####-##-## |
(1999393, 2)
'procs before merge'
| | project_id | date_anx | proc_name | proc_local_code |
|---|---|---|---|---|
| 12 | PR-######## | ####-##-## | Removal of central venous catheter | L914 |
| 13 | PR-######## | ####-##-## | Specified vein of upper body | Z918 |
| 14 | PR-######## | ####-##-## | Right sided operation | Z942 |
| 24 | PR-######## | ####-##-## | Other specified : Other operations on bladder | M498 |
| 25 | PR-######## | ####-##-## | Approach to organ through cystostomy | Y523 |
(43228, 4)
'Df after left merge (1915808, 4). Keeping all days where anxiety diagnosed'
| | project_id | date_anx | proc_name | proc_local_code |
|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | NaN | NaN |
| 1 | PR-######## | ####-##-## | NaN | NaN |
| 2 | PR-######## | ####-##-## | Removal of central venous catheter | L914 |
| 3 | PR-######## | ####-##-## | Specified vein of upper body | Z918 |
| 4 | PR-######## | ####-##-## | Right sided operation | Z942 |
Replacing na...
Done!
'Final df, left merge, convert NaN, add month, year cols (1915808, 6)'
| | project_id | date_anx | proc_name | proc_local_code | month | year |
|---|---|---|---|---|---|---|
| 0 | PR-######## | ####-##-## | no_proc | no_proc | Apr | #### |
| 1 | PR-######## | ####-##-## | no_proc | no_proc | Apr | #### |
| 2 | PR-######## | ####-##-## | Removal of central venous catheter | L914 | Apr | #### |
| 3 | PR-######## | ####-##-## | Specified vein of upper body | Z918 | Apr | #### |
| 4 | PR-######## | ####-##-## | Right sided operation | Z942 | Apr | #### |
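The `how='left'` merge above keeps every (patient, day) anxiety row, leaving `NaN` where no same-day procedure matched; `fillna('no_proc')` then turns those gaps into an explicit category. A toy sketch (invented IDs and dates) of the same pattern:

```python
import pandas as pd

anx_days = pd.DataFrame({
    'project_id': ['A', 'B'],
    'date_anx': pd.to_datetime(['2019-05-01', '2019-05-02']),
})
procs = pd.DataFrame({
    'project_id': ['A'],
    'date_anx': pd.to_datetime(['2019-05-01']),
    'proc_name': ['Polysomnography'],
})

# with no 'on' argument, merge keys default to the shared columns
merged = anx_days.merge(procs, how='left').fillna('no_proc')
print(merged['proc_name'].tolist())  # ['Polysomnography', 'no_proc']
```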
# make copy
proc_diag_clean = proc_diag.copy()
######## TEST ###########
# test_pts = ['PR-'+num for num in ['########','########','########']]
# proc_diag_clean = proc_diag_clean[proc_diag_clean['project_id'].isin(test_pts)]
######## TEST ###########
# get proc_name and proc_local_code dict
proc_code_dict = dict(zip(proc_diag['proc_name'], proc_diag['proc_local_code']))
# add col for opcs-4 code
proc_diag_clean['proc_local_code'] = proc_diag_clean['proc_name'].replace(proc_code_dict)
# filter out supplementary X, Y, Z chapter codes
proc_diag_clean = proc_diag_clean[
    ~proc_diag_clean['proc_local_code'].str.startswith(('X', 'Y', 'Z'))
]
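The prefix filter relies on `Series.str.startswith`, which accepts a tuple of prefixes as well as a single string, so the three chapter letters can be tested in one condition (sketched on invented codes):

```python
import pandas as pd

codes = pd.Series(['L914', 'Z918', 'Y523', 'X998', 'U201'])

# drop supplementary X/Y/Z chapter codes in a single condition
keep = codes[~codes.str.startswith(('X', 'Y', 'Z'))]
print(keep.tolist())  # ['L914', 'U201']
```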
most_common_procs = (
proc_diag_clean
.groupby(['project_id','date_anx'])
['proc_name'].unique().explode()
.value_counts()
.reset_index(name='count').rename(columns={'index':'proc_name'})
)
# get top 20 from procs count
top_20_procs = most_common_procs.head(20).copy()
# make local code col
top_20_procs['proc_local_code'] = top_20_procs['proc_name'].replace(proc_code_dict)
# make new col of proc name and code
top_20_procs['proc_name_code'] = top_20_procs['proc_name'] + " (" + top_20_procs['proc_local_code'] + ")"
top_20_procs['proc_name_code'] = top_20_procs['proc_name_code'].fillna('no_proc')
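String concatenation on pandas object columns propagates `NaN` — if either operand is missing, the combined label is missing — which is why `fillna` follows the concatenation above. A toy sketch:

```python
import pandas as pd
import numpy as np

names = pd.Series(['Polysomnography', 'no_proc'])
codes = pd.Series(['U331', np.nan])

# the second label is NaN because its code is NaN
label = names + " (" + codes + ")"
print(label.isna().tolist())             # [False, True]
print(label.fillna('no_proc').tolist())  # ['Polysomnography (U331)', 'no_proc']
```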
# get pct
top_20_procs['pct'] = 100 * top_20_procs['count'] / top_20_procs['count'].sum()
# fig
top_20_procs_fig = px.bar(top_20_procs.head(10),
x='proc_name_code',
y='pct',
color='proc_name_code',
title='Top 10 procedures for which anxiety (F41x) was diagnosed on the same day<br><sup>Filtered for just OPCS-4 codes</sup>'
)
top_20_procs_fig.show()
display(top_20_procs)
| | proc_name | count | proc_local_code | proc_name_code | pct |
|---|---|---|---|---|---|
| 0 | no_proc | #### | no_proc | no_proc (no_proc) | 98.107300 |
| 1 | Transthoracic echocardiography | 286 | U201 | Transthoracic echocardiography (U201) | 0.288854 |
| 2 | Injection of therapeutic substance into cerebr... | 175 | A542 | Injection of therapeutic substance into cerebr... | 0.176746 |
| 3 | Magnetic resonance imaging NEC | 150 | U211 | Magnetic resonance imaging NEC (U211) | 0.151497 |
| 4 | Other specified : Other operations on connecti... | 141 | A148 | Other specified : Other operations on connecti... | 0.142407 |
| 5 | Other specified rehabilitation for musculoskel... | 96 | U508 | Other specified rehabilitation for musculoskel... | 0.096958 |
| 6 | Magnetic resonance imaging of head | 96 | U052 | Magnetic resonance imaging of head (U052) | 0.096958 |
| 7 | Injection of therapeutic substance around spin... | 80 | A577 | Injection of therapeutic substance around spin... | 0.080798 |
| 8 | Uroflowmetry NEC | 80 | U262 | Uroflowmetry NEC (U262) | 0.080798 |
| 9 | Ultrasound scan NEC | 77 | U216 | Ultrasound scan NEC (U216) | 0.077768 |
| 10 | Polysomnography | 76 | U331 | Polysomnography (U331) | 0.076758 |
| 11 | Laser destruction of lesion of skin of head or... | 76 | S091 | Laser destruction of lesion of skin of head or... | 0.076758 |
| 12 | Ultrasound of bladder | 75 | U124 | Ultrasound of bladder (U124) | 0.075748 |
| 13 | Urodynamics NEC | 74 | U264 | Urodynamics NEC (U264) | 0.074738 |
| 14 | Pelvis NEC | 74 | O161 | Pelvis NEC (O161) | 0.074738 |
| 15 | Fibreoptic endoscopic examination of upper gas... | 73 | G451 | Fibreoptic endoscopic examination of upper gas... | 0.073728 |
| 16 | Removal of central venous catheter | 66 | L914 | Removal of central venous catheter (L914) | 0.066659 |
| 17 | Invasive ventilation | 62 | E851 | Invasive ventilation (E851) | 0.062619 |
| 18 | Plain X-ray NEC | 59 | U217 | Plain X-ray NEC (U217) | 0.059589 |
| 19 | Cardiac magnetic resonance imaging | 58 | U103 | Cardiac magnetic resonance imaging (U103) | 0.058579 |
# make copy
proc_over_time = proc_diag.copy()
######## TEST ###########
# test_pts = ['PR-'+num for num in ['########','########','########','########']]
# proc_over_time = proc_over_time[proc_over_time['project_id'].isin(test_pts)]
# proc_level = 'proc_name'
######## TEST ###########
# get only procedures in top 20 list
proc_over_time = proc_over_time[proc_over_time['proc_name'].isin(top_20_procs.proc_name)]
# group datetime col into monthly
proc_counts = (
proc_over_time.groupby(['project_id',pd.Grouper(key='date_anx',freq='M')])
['proc_name'].unique()
.explode()
.reset_index(name='proc_name')
.sort_values(by=['date_anx','project_id','proc_name'])
)
# get total num patients in month
total_pts_per_month = proc_counts.groupby(['date_anx'])['project_id'].count().reset_index(name='total_pts')
# display('TOtal pts per month is',total_pts_per_month)
# merge total pts per month with drug_counts per month
proc_counts= (proc_counts
.groupby('date_anx')
['proc_name'].value_counts()
.reset_index(name='proc_count')
.merge(total_pts_per_month)
)
# display('Proc counts is: ',proc_counts)
# get pct per month
proc_counts['pct'] = 100 * proc_counts['proc_count'] / proc_counts.groupby('date_anx')['proc_count'].transform('sum')
# make proc local code dict
proc_counts['proc_local_code'] = proc_counts['proc_name'].replace(proc_code_dict)
# ADD COLUMNS FOR DIAGNOSIS HIERARCHIES
proc_counts['proc_level_1'] = proc_counts['proc_local_code'].str[0]
proc_counts['proc_level_2'] = proc_counts['proc_local_code'].str[0:2]
proc_counts['proc_level_3'] = proc_counts['proc_local_code'].str[0:3]
#plot
procs_fig = px.area(proc_counts,
x='date_anx',
y='pct',
color='proc_level_1',
line_group='proc_name',
pattern_shape='proc_name',
title=f"""
Same day procedures for patients diagnosed with anxiety over time
<br><sup>'no_proc' refers to pts diagnosed with F41x on a day without any recorded procedure</sup>
<br><sup>Colour corresponds to the procedure's OPCS-4 top-level code</sup>
"""
)
# changing orientation of the legend
procs_fig.update_layout(legend=dict(
orientation="h",
y=-0.5,
x=0,
))
procs_fig.show()
display(proc_counts.head(30).set_index(['date_anx','proc_name','pct']))
| date_anx | proc_name | pct | proc_count | total_pts | proc_local_code | proc_level_1 | proc_level_2 | proc_level_3 |
|---|---|---|---|---|---|---|---|---|
| 2019-04-30 | no_proc | 84.210526 | 32 | 38 | no_proc | n | no | no_ |
| | Cardiac magnetic resonance imaging | 2.631579 | 1 | 38 | U103 | U | U1 | U10 |
| | Invasive ventilation | 2.631579 | 1 | 38 | E851 | E | E8 | E85 |
| | Other specified : Other operations on connection from ventricle of brain | 2.631579 | 1 | 38 | A148 | A | A1 | A14 |
| | Polysomnography | 2.631579 | 1 | 38 | U331 | U | U3 | U33 |
| | Removal of central venous catheter | 2.631579 | 1 | 38 | L914 | L | L9 | L91 |
| | Transthoracic echocardiography | 2.631579 | 1 | 38 | U201 | U | U2 | U20 |
| 2019-05-31 | no_proc | 79.000000 | 79 | 100 | no_proc | n | no | no_ |
| | Transthoracic echocardiography | 5.000000 | 5 | 100 | U201 | U | U2 | U20 |
| | Polysomnography | 3.000000 | 3 | 100 | U331 | U | U3 | U33 |
| | Injection of therapeutic substance into cerebrospinal fluid | 2.000000 | 2 | 100 | A542 | A | A5 | A54 |
| | Laser destruction of lesion of skin of head or neck | 2.000000 | 2 | 100 | S091 | S | S0 | S09 |
| | Other specified rehabilitation for musculoskeletal disorders | 2.000000 | 2 | 100 | U508 | U | U5 | U50 |
| | Ultrasound scan NEC | 2.000000 | 2 | 100 | U216 | U | U2 | U21 |
| | Invasive ventilation | 1.000000 | 1 | 100 | E851 | E | E8 | E85 |
| | Other specified : Other operations on connection from ventricle of brain | 1.000000 | 1 | 100 | A148 | A | A1 | A14 |
| | Pelvis NEC | 1.000000 | 1 | 100 | O161 | O | O1 | O16 |
| | Removal of central venous catheter | 1.000000 | 1 | 100 | L914 | L | L9 | L91 |
| | Uroflowmetry NEC | 1.000000 | 1 | 100 | U262 | U | U2 | U26 |
| 2019-06-30 | no_proc | 79.687500 | 102 | 128 | no_proc | n | no | no_ |
| | Transthoracic echocardiography | 7.031250 | 9 | 128 | U201 | U | U2 | U20 |
| | Cardiac magnetic resonance imaging | 2.343750 | 3 | 128 | U103 | U | U1 | U10 |
| | Magnetic resonance imaging of head | 2.343750 | 3 | 128 | U052 | U | U0 | U05 |
| | Injection of therapeutic substance around spinal nerve root | 1.562500 | 2 | 128 | A577 | A | A5 | A57 |
| | Plain X-ray NEC | 1.562500 | 2 | 128 | U217 | U | U2 | U21 |
| | Polysomnography | 1.562500 | 2 | 128 | U331 | U | U3 | U33 |
| | Injection of therapeutic substance into cerebrospinal fluid | 0.781250 | 1 | 128 | A542 | A | A5 | A54 |
| | Invasive ventilation | 0.781250 | 1 | 128 | E851 | E | E8 | E85 |
| | Laser destruction of lesion of skin of head or neck | 0.781250 | 1 | 128 | S091 | S | S0 | S09 |
| | Other specified : Other operations on connection from ventricle of brain | 0.781250 | 1 | 128 | A148 | A | A1 | A14 |
# MAKE DF FOR LINE GRAPH
proc_counts_line = proc_counts.copy()
proc_counts_line = proc_counts_line.groupby(['date_anx','proc_level_1'])['pct'].sum().reset_index(name='pct').sort_values(by=['date_anx','pct'])
#plot
procs_fig_line = px.line(proc_counts_line,
x='date_anx',
y='pct',
color='proc_level_1',
title=f"""
Same day procedures for patients diagnosed with anxiety over time
<br><sup>'no_proc' refers to pts diagnosed with F41x on a day without any recorded procedure</sup>
""")
# changing orientation of the legend
# procs_fig_line.update_layout(legend=dict(
# orientation="h",
# y=-0.5,
# x=0,
# ))
procs_fig_line.show()
display(proc_counts_line.head())
| | date_anx | proc_level_1 | pct |
|---|---|---|---|
| 0 | ####-##-## | A | 2.631579 |
| 1 | ####-##-## | E | 2.631579 |
| 2 | ####-##-## | L | 2.631579 |
| 3 | ####-##-## | U | 7.894737 |
| 4 | ####-##-## | n | 84.210526 |
def clean_graph(
original_fig,
xaxis_title = None,
yaxis_title = None,
title=None,
legend_title=None,
legend_pos_x_y = (False,False),
opacity = 1,
width=900,
height=500
):
fig= go.Figure(original_fig)
fig.update_layout(
#remove bg
paper_bgcolor='rgba(0,0,0,0)',
plot_bgcolor='rgba(0,0,0,0)',
# update titles
title = title,
xaxis_title = xaxis_title,
yaxis_title = yaxis_title,
legend_title = legend_title,
#update sizing
margin=dict(l=20, r=0, t=100, b=20),
width=width,
height=height,
)
# update legend position
if legend_pos_x_y[0] is not False and legend_pos_x_y[1] is not False:
fig.update_layout(
legend=dict(
yanchor="top",
y=legend_pos_x_y[1],
xanchor="left",
x=legend_pos_x_y[0]
)
)
# decrease marker opacity
fig.update_traces(
opacity = opacity
)
return fig
display(md(f"""
# Exploration of Paediatric Patients Diagnosed with Anxiety
## *by Anchit Chandran*
<sup> Disclaimer: this is an internal report; areas of discussion are not necessarily backed by evidence-based propositions.</sup>
**This report details the insights gained from an exploratory data analysis of GOSH patients diagnosed with anxiety, according to the ICD-10 definition.**
It covers 3 main sections:
> 1. Anxiety diagnoses
> 2. Medication administrations for patients with anxiety
> 3. Procedures performed on patients with anxiety
The raw data (for patients diagnosed with ICD-10 anxiety code) had the following characteristics:
**Diagnoses**
- Description: *Data including the ICD-10 diagnosis code for patients on any given day.*
- Date range: *{str(df_main.start_datetime.min().date())}* until *{str(df_main.start_datetime.max().date())}*
- Number of entries: *{df_main.shape[0]:,} rows*
**Specialties**
- Description: *Data including the specialty referrals for patients on any given day.*
- Date range: *{str(admissions_df.start_datetime.min().date())}* until *{str(admissions_df.start_datetime.max().date())}*
- Number of entries: *{admissions_df.shape[0]:,} rows*
**Medications**
- Description: *Data including the medications (filtered to only antidepressants, anxiolytics, and hypnotics) administered to patients on any given day.*
- Date range: *{str(admins_df.start_datetime.min().date())}* until *{str(admins_df.start_datetime.max().date())}*
- Number of entries: *{admins_df.shape[0]:,} rows*
**Procedures**
- Description: *Data including the procedures performed on patients for any given day.*
- Date range: *{str(proc_df.start_datetime.min().date())}* until *{str(proc_df.start_datetime.max().date())}*
- Number of entries: *{proc_df.shape[0]:,} rows*
"""))
display(md("""
## How is anxiety defined?
This analysis focussed on the [**F.41.x** anxiety codes](https://icd.who.int/browse10/2019/en#/F41) - *Other anxiety disorders*. These are defined as:
>*Disorders in which manifestation of anxiety is the major symptom and is not restricted to any particular environmental situation. Depressive and obsessional symptoms, and even some elements of phobic anxiety, may also be present, provided that they are clearly secondary or less severe.*
**Taking an overall snapshot of the entire dataset, the most common F.41 anxiety diagnoses were *Anxiety Disorder, unspecified* and *Other specified anxiety disorders*:**
"""))
clean_graph(fig_top_anx_diagnoses,
title = 'Frequency distribution of F.41 anxiety diagnoses',
xaxis_title = None,
yaxis_title = 'Number of diagnosis codes',
legend_title=None,
legend_pos_x_y = (0.85,0.95),
).show()
display(md("""
## How did the prevalence of F.41-type anxiety diagnoses change over time?
**The scatter plot below reveals yearly variation in the monthly sum of distinct anxiety diagnoses, with peaks in summer 2019 and 2021.**
"""))
clean_graph(fig_total_dx_over_time,
title = 'Variation in sum F.41 anxiety diagnoses over time<br><sup>UK Lockdown dates marked</sup>',
xaxis_title = None,
yaxis_title = 'Number of diagnosis codes',
legend_title=None,
legend_pos_x_y = (0.85,0.95),
).show()
# STRATIFIED BY ANX CODE
display(md(f"""
Once stratified by the lowest level of ICD-10's hierarchy, **"*Anxiety disorder, unspecified*" forms a decreasing proportion of the monthly sum of anxiety diagnoses**.
"""))
clean_graph(fig_per_diag,
title = 'Variation in F.41 anxiety diagnoses over time, stratified by diagnosis',
xaxis_title = None,
yaxis_title = 'Number of diagnosis codes',
legend_title=None,
width=975,
).show()
display(md(f"""
When normalised against the total F.41 diagnoses per month, "*Anxiety disorder, unspecified*" exhibits a clear downward trend, from a maximum of 93% in September 2019 to a minimum of 28% in September 2021.
Conversely, *Other specified anxiety disorders* diagnoses increase from 5% in September 2019 to a maximum of 66% in October 2021.
Overall, other codes within the F.41 group remain consistent.
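The per-month normalisation can be sketched with a `pandas` crosstab; a minimal illustration in which the toy rows stand in for the diagnoses table (reusing its column names):

```python
import pandas as pd

# Illustrative only: toy rows standing in for the diagnoses table.
df = pd.DataFrame({
    'start_datetime': pd.to_datetime(
        ['2021-01-05', '2021-01-20', '2021-01-21', '2021-02-02']),
    'diag_local_code': ['F419', 'F419', 'F418', 'F418'],
})
month = df['start_datetime'].dt.to_period('M')
# normalize='index' makes each row (month) sum to 1; scale to %
monthly_pct = pd.crosstab(month, df['diag_local_code'], normalize='index') * 100
```

Each row of `monthly_pct` then gives the percentage share of every F.41 code within that month's diagnoses.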
"""))
clean_graph(fig_per_diag_pct,
title = 'Normalised variation in F.41 anxiety diagnoses over time, stratified by diagnosis',
xaxis_title = None,
yaxis_title = 'Proportion of total F.41 diagnoses (%)',
legend_title=None,
width=975,
).show()
display(md(f"""
### What could have caused this change in diagnosis prevalence?
Multiple factors are likely involved, of which COVID-19 may have played a significant role.
The first observed trough occurs at the beginning of the pandemic, when the most cautious measures were in place.
Rather than a true decrease in anxiety, hospital policies, combined with decreased capacity and resources, could have led to fewer psychiatric admissions.
Coinciding with advances in *telehealth*, this decrease in recorded prevalence does not necessarily imply negative health outcomes.
Holistic treatment frameworks, such as the Bio-Psycho-Social model, are more crucial in psychiatric illness than in biological illness. When biological treatment capacity is forcibly decreased, advances in the psycho-social aspects may have followed.
The following peak of anxiety diagnoses in early 2021 could be due to a rebound of untreated patients, alongside the toll of lockdown on mental health.
However, this increase may also be seen because of mental health education and treatment de-stigmatisation.
Ultimately, further research is required to investigate outcomes for psychiatric patients.
### Limitations
To form a comprehensive understanding of the causality behind this observed pattern, its limitations must be considered.
Measurement error is one such imperative factor.
Psychiatric conditions are inherently subjective, leading to a lack of reliable tools for standardised diagnosis. For the same patient, on the same day, the perception of anxiety may differ both between different clinicians, and the same clinician at different times.
For these paediatric patients experiencing pain and discomfort in uncertain environments, the ICD-10 codes may be unable to perfectly capture the turbulent black-box of their minds.
"""))
display(md("""
## How long are patients diagnosed with anxiety?
This section required the greatest amount of data wrangling and cleaning. Certain assumptions were required for analysis.
Initial exploration found numerous patients for whom a period of consecutive daily anxiety diagnoses included isolated single days on which no anxiety diagnosis was recorded.
For example, if a patient had 100 days of consecutive anxiety, Day 50 may not have an anxiety code recorded. In analysis, this would be identified as two discrete periods of anxiety lasting 49 and 50 days respectively, separated by an anomalous anxiety-free day, rather than one extended period of 100 days.
It was assumed this was likely a mistake in clinical diagnosis coding, and so, the following algorithm was used to find and fill in missing diagnoses:
1. The period of anxiety must belong to a single patient.
2. The missing day(s) must have anxiety diagnosed before **and** after.
3. The total number of missed day(s) must be ≤ 2 days.
4. *If all 3 are true* → fill in the missing day(s) with an anxiety diagnosis.
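Applied per patient (e.g. via a `groupby('project_id')`), the steps above can be sketched as follows. This is an illustrative reconstruction, not the production code: `fill_short_gaps` and its daily boolean-indicator input are assumptions.

```python
import pandas as pd

def fill_short_gaps(anx_days: pd.Series, max_gap: int = 2) -> pd.Series:
    """Fill gaps of <= max_gap days in one patient's daily anxiety indicator.

    `anx_days` is a daily-indexed boolean Series (True = anxiety code
    recorded that day). A gap is only filled when it is bounded by
    diagnosed days on both sides.
    """
    filled = anx_days.copy()
    diagnosed = anx_days[anx_days].index
    for prev_day, next_day in zip(diagnosed[:-1], diagnosed[1:]):
        gap = (next_day - prev_day).days - 1
        if 0 < gap <= max_gap:
            filled.loc[prev_day:next_day] = True  # rule 4: fill the gap
    return filled
```

A gap of 1–2 days between two diagnosed days is filled in; longer gaps, and trailing undiagnosed days, are left untouched.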
**This data was then plotted using a histogram, showing the frequencies of different durations of anxiety.**
"""))
clean_graph(freq_dist_fig_anx_length,
title = 'Log frequency distribution of anxiety durations',
xaxis_title = 'Anxiety Duration (days)',
yaxis_title = 'Log frequency',
legend_title=None,
).show()
display(md("""
## What other diseases do these patients have?
Disease codes recorded on the same day as an F.41 anxiety code were summed to investigate the patients' comorbidities.
*N.B. F.41 codes were filtered out of the count*.
*N.B. R and Z codes were filtered out as they were "other"/"unspecified" codes, and had a significantly greater sum.*
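The same-day comorbidity count can be sketched as below; the toy frame stands in for the diagnoses table (same column names), and the filtering mirrors the two N.B. points above.

```python
import pandas as pd

# Illustrative rows standing in for the diagnoses table.
df = pd.DataFrame({
    'project_id':      [1, 1, 1, 2, 2],
    'start_datetime':  pd.to_datetime(
        ['2021-03-01', '2021-03-01', '2021-03-02', '2021-03-01', '2021-03-01']),
    'diag_local_code': ['F419', 'N185', 'N185', 'F410', 'R69'],
    'diag_name':       ['Anxiety disorder, unspecified', 'CKD stage 5',
                        'CKD stage 5', 'Panic disorder', 'Unknown causes'],
})
# Days on which an F.41 code was recorded, per patient
anx_days = (df[df['diag_local_code'].str.startswith('F41')]
            [['project_id', 'start_datetime']].drop_duplicates())
# All codes recorded on those patient-days
same_day = df.merge(anx_days, on=['project_id', 'start_datetime'])
# Drop the F.41 codes themselves and the non-specific R/Z chapters
comorbid = same_day[~same_day['diag_local_code'].str[0].isin(['R', 'Z'])
                    & ~same_day['diag_local_code'].str.startswith('F41')]
top = comorbid['diag_name'].value_counts()
```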
"""))
clean_graph(comborb_fig_diag_name,
title = 'Top 20 most common co-morbidities associated with an F.41 anxiety code',
xaxis_title = None,
yaxis_title = '%',
legend_title=None,
width=1200
).show()
clean_graph(comborb_fig_level_1,
title = 'Top 20 most common co-morbidities associated with an F.41 anxiety code<br><sup>Grouped by ICD-10\'s highest level hierarchy.</sup>',
xaxis_title = 'ICD-10 Code',
yaxis_title = '%',
legend_title=None,
width=1200,
).show()
display(md("""
At the most granular level of the ICD-10 hierarchy, the top 3 co-diagnosed disease codes were:
1. Chronic Kidney Disease *(Stage V)*
2. Childhood Autism
3. Chronic Kidney Disease *(unspecified)*
When looking at the highest level, the most common codes were:
1. **Q** - *Congenital malformations, deformations and chromosomal abnormalities*
2. **K** - *Diseases of the digestive system*
3. **F** - *Mental and behavioural disorders*
"""))
display(md("""
## What medications are being given to patients with anxiety?
An overall sum of each medication's administrations shows that **Melatonin** is by far the most commonly administered medication.
"""))
clean_graph(overall_meds_fig,
title = 'Frequency distribution of anxiolytic and antidepressant administrations',
xaxis_title = None,
yaxis_title = 'Total administrations (n)',
legend_title=None,
# width=1200
).show()
display(md("""
[According to the BNFc](https://bnfc.nice.org.uk/drug/melatonin.html), the indications for Melatonin relate to insomnia, particularly in the following patient groups:
- Patients with learning disabilities and behavioural challenges
- Patients with autism spectrum disorder
- Patients with [Smith Magenis Syndrome](https://rarediseases.org/rare-diseases/smith-magenis-syndrome/) - a rare developmental disorder
These indications fit the most common comorbidities, and anxiety itself can lead to sleep-related behavioural difficulties, which could help explain Melatonin's popular administration.
"""))
display(md("""
### How does medication use change over time?
Below is a line chart showing the monthly percentage of HAA medications administered on the same day as an F.41 anxiety diagnosis.
**Most patients diagnosed with anxiety were not administered any of the HAA medications.**
"""))
clean_graph(drug_admin_time_line_fig,
title = 'Proportional variation over time in medication administration to F.41 anxiety patients',
xaxis_title = None,
yaxis_title = '%',
legend_title=None,
# width=1200,
).show()
display(md("""
**When stratified by *drug class*, Hypnotic and Anxiolytic administrations far exceed Antidepressant administrations. Both classes appear to remain stable over time.**
"""))
clean_graph(drug_admin_yearly_fig,
title = 'Proportional variation over time in medication administration to F.41 anxiety patients<br><sup>Grouped by drug class</sup>',
xaxis_title = None,
yaxis_title = '%',
legend_title=None,
# width=1200,
).show()
display(md("""
The difference in frequency could be explained by the respective benefit-to-side-effect ratios.
For example, Melatonin, within the Hypnotics and Anxiolytics class, can be used in a diverse patient population and has few side-effects. Conversely, SSRIs such as Fluoxetine are rarely prescribed in children, reserved for severe CAMHS disorders, and must be used with great care.
"""))
display(md("""
## Which procedures are performed on patients with anxiety?
This analysis focussed only on recorded procedures tagged according to the OPCS-4 structure.
Within this hierarchy, the X, Y and Z chapters were deemed "other" codes: ambiguous and non-specific, creating inaccurate duplicates for a single procedure.
For example, a nephrectomy may be tagged under the M.03 class (*Partial excision of kidney*) as well as Z.94.2 (*Right sided operation*).
Therefore, the X, Y and Z codes were filtered out of the analysis.
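As a minimal sketch of the chapter filter (the column name `procedure_code` is an assumption about the procedures table, and the data is illustrative):

```python
import pandas as pd

# Illustrative data; 'procedure_code' is an assumed column name.
procs = pd.DataFrame({'procedure_code': ['U202', 'M031', 'Z942', 'X998', 'A051']})
# Keep only codes outside the non-specific X/Y/Z chapters
specific = procs[~procs['procedure_code'].str[0].isin(['X', 'Y', 'Z'])]
```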
### What are the most common procedures performed on patients diagnosed with anxiety?
**Procedures were rarely performed on the days patients were diagnosed with anxiety**, according to this analysis.
When simply looking for the most common procedures, this analysis found the top 3 to be:
1. Transthoracic echocardiography
2. Injection of therapeutic substance into CSF
3. MRI NEC (*Not elsewhere classified*)
"""))
clean_graph(top_20_procs_fig,
title = 'Frequency distribution of procedures<br><sup>Conducted on F.41 anxiety patients</sup>',
xaxis_title = None,
yaxis_title = '%',
legend_title=None,
width=1200,
height=550
).show()
display(md("""
### How did the frequency of procedures change over time?
Utilising the 20 most common procedures found previously, a line chart and area chart were plotted to examine trends over time. Each procedure was grouped into its top level OPCS-4 hierarchy.
The *U* class is shown to make up the highest proportion of procedures, and trends upwards. The OPCS-4 groups the U codes as *Diagnostic imaging, testing and rehabilitation*.
The other classes of procedures appear to remain relatively stable over time, with the next most common being *A* (Procedures related to the *Nervous System* e.g. open biopsies of the brain).
"""))
clean_graph(procs_fig_line,
title = 'Proportional variation over time of procedures performed on F.41 anxiety patients',
xaxis_title = None,
yaxis_title = '%',
legend_title=None,
# width=1200,
).show()
clean_graph(procs_fig,
title = 'Proportional variation over time of procedures performed on F.41 anxiety patients<br><sup>Grouped by procedure</sup>',
xaxis_title = None,
yaxis_title = '%',
legend_title=None,
# width=1200,
).show()
display(md("""
# Further Work
This report was a preliminary exploration of two dimensions thought to be related to anxiety diagnoses: medication administrations and procedures.
There exist numerous avenues for development upon the initial findings of this report.
Currently, this analysis only covers data from 2019 onwards. The primary reason relates to feasibility: earlier periods of data are messier and would require more time than allotted to explore.
Delving deeper into the findings, it is important to combine data across sections to reveal correlations.
Perhaps there are particular procedures which increase perceived anxiety in a patient. As procedures are necessary, this anxiety may be treated with medications to acutely calm the patient; as the root cause of the anxiety has not been addressed, its duration will increase.
An alternative approach involves identifying anxiety-inducing procedures and focussing psychological resources on the stress they place on children's psyches, thereby demanding fewer medications and leading to improved patient outcomes and cost-effectiveness.
Machine learning methods could also be applied to improve clinical outcomes.
An appropriate example here is an unsupervised clustering algorithm that identifies groups of patients based on dimensions such as demographics, medications and procedures. Clinical early-warning systems could use these clusters to identify medication efficacy for patients with anxiety. Moreover, they could warn against procedures which worsen anxiety duration and long-term clinical outcomes.
"""))
*Disclaimer: this is an internal report within which areas of discussion are not backed by evidence-based propositions.*
This report details the insights gained from an exploratory data analysis of GOSH patients diagnosed with anxiety, according to the ICD-10 definition.
It covers 3 main sections:
- Anxiety diagnoses
- Medication administrations for patients with anxiety
- Procedures performed on patients with anxiety
The raw data (for patients diagnosed with an ICD-10 anxiety code) covered the following areas:
- Diagnoses
- Specialties
- Medications
- Procedures